Protein biomarkers of disease progression in patients with systemic sclerosis associated interstitial lung disease

Systemic sclerosis is a rare connective tissue disease; and interstitial lung disease (SSc–ILD) is associated with significant morbidity and mortality. There are no clinical, radiologic features, nor biomarkers that identify the specific time when patients are at risk for progression at which the benefits from treatment outweigh the risks. Our study aimed to identify blood protein biomarkers associated with progression of interstitial lung disease in patients with SSc–ILD using an unbiased, high-throughput approach. We classified SSc–ILD as progressive or stable based on change in forced vital capacity over 12 months or less. We profiled serum proteins by quantitative mass spectrometry and analyzed the association between protein levels and progression of SSc–ILD via logistic regression. The proteins associated with at a p value of < 0.1 were queried in the ingenuity pathway analysis (IPA) software to identify interaction networks, signaling, and metabolic pathways. Through principal component analysis, the relationship between the top 10 principal components and progression was evaluated. Unsupervised hierarchical clustering with heatmapping was done to define unique groups. The cohort consisted of 72 patients, 32 with progressive SSc–ILD and 40 with stable disease with similar baseline characteristics. Of a total of 794 proteins, 29 were associated with disease progression. After adjusting for multiple testing, these associations did not remain significant. IPA identified five upstream regulators that targeted proteins associated with progression, as well as a canonical pathway with a higher signal in the progression group. Principal component analysis showed that the ten components with the highest Eigenvalues represented 41% of the variability of the sample. Unsupervised clustering analysis revealed no significant heterogeneity between the subjects. We identified 29 proteins associated with progressive SSc–ILD. While these associations did not remain significant after accounting for multiple testing, some of these proteins are part of pathways relevant to autoimmunity and fibrogenesis. Limitations included a small sample size and a proportion of immunosuppressant use in the cohort, which could have altered the expression of inflammatory and immunologic proteins. Future directions include a targeted evaluation of these proteins in another SSc–ILD cohort or application of this study design to a treatment naïve population.

Systemic sclerosis (scleroderma or SSc) is a rare connective tissue disease characterized by immune dysfunction, vasculopathy, cellular inflammation, and fibrosis. SSc commonly manifests in the skin, lung parenchyma, and pulmonary vasculature 1 . Cohort studies have estimated the prevalence of SSc-related interstitial lung disease (SSc-ILD) at between 30 and 50% 2,3 with 30-35% of all scleroderma-related deaths attributed to it 4 . SSc-ILD has a variable clinical course 5 . While most patients experience long periods of stability, about a third have disease progression after disease onset that is associated with higher mortality [5][6][7][8] . Given the significant side effects from the existing therapies like immunosuppressant and antifibrotic medications, current guidelines recommend active surveillance and treatment after evidence of progressive lung disease 9,10 , most commonly defined by decline in forced vital capacity (FVC) 11,12 . Risk factors that identify patients at risk for development of ILD do not predict disease progression 1 . Prior studies have determined clinical and laboratory predictors of mortality and progression of ILD, such as presence of the Scl-70 antibody, extent of disease on computed tomography, and decline in FVC and DLCO; however, these do not identify the specific time in the clinical course at which patients are at risk for progression and thus most likely to benefit from initiation of treatment 13,14 . As a result, patients often lose irrecoverable lung function prior to initiation of treatment, as progression is only realized after the fact of its occurrence.
Serum biomarkers of progressive SSc-ILD are an attractive alternative to identify patients at risk for progression given the ease of collection. Prior biomarker studies in SSc-ILD determine the risk of developing ILD but do not predict progression 13,15 , or they have focused on candidate proteins by extrapolating data from other organs affected by systemic sclerosis, such as skin [16][17][18][19] . Our study aimed to identify blood protein biomarkers associated with progression of interstitial lung disease in patients with SSc-ILD using an unbiased, high-throughput proteomic approach.

Methods
Study design and participants. In this single center, nested case-control study, we identified subjects with SSc-ILD among patients enrolled in the University of California, San Francisco Scleroderma Center registry from September 2015 to June 2020. All patients fulfilled the 2013 American College of Rheumatology/ European League Rheumatism criteria for SSc 20 . Presence of ILD was confirmed by radiographic evidence of pulmonary fibrosis on high resolution computed tomography (HRCT) in patients exhibiting forced vital capacity (FVC) < 80% or a greater than 10% decline in FVC (L) over twelve months. Longitudinal pulmonary function testing was used to classify patients having stable SSc-ILD or progressive SSc-ILD: progressive disease cases were identified by a decline in absolute FVC of 5% or more over a 12-month period, whereas stable disease controls were defined as less than 5% change in absolute FVC over 12 months. In cases where patients had multiple sets of PFT measurements that met our definition of progressive disease, we utilized the first set of PFTs that met the definition. The threshold of 5% was chosen based on prior studies in idiopathic pulmonary fibrosis and SSc-ILD in which a decline of FVC of over 5% was associated with higher mortality 7,21-23 , and an analysis from the two largest SSc-ILD clinical trials, which showed that a minimally clinically important difference correlated with an FVC decline of 3% 24 . Cases (SSc-ILD progressors) and controls (SSc-ILD) were matched for sex at birth, age, BMI, and use of immunosuppression at the time of entering the study. All patients included in the analysis had a blood sample available within 3 months of the pre-progression pulmonary function test (PFT) used for classification.
Specimen processing and liquid chromatography with dual mass spectrometry (LC-MS/MS) analysis. Biobanked serum specimens were utilized for the proteomics study. These samples were collected within three months of disease progression on pulmonary function tests. Peripheral blood was collected from consenting patients at the time of clinical visits. Serum was isolated, aliquoted, and stored at − 80 °C until the time of this study.
Serum samples were depleted of the 14 most abundant proteins using the High Select Top 14 Abundant Protein Depletion Camel Antibody Resin (Thermo Fisher Scientific) to enhance the detection and identification of the less abundant proteins and analyzed as outlined by McArdle et al. 25 . Samples subsequently underwent serum trypsin digestion and desalting prior to tandem liquid chromatography-mass spectrometry analysis. Dataindependent acquisition (DIA) analysis was performed on an Orbitrap Exploris 480 (Thermo Fischer Scientific) mass spectrometer interfaced with a microflow-nanospray electrospray ionization source (Newomics, IS-T01) coupled to an Ultimate 3000 ultra-high-pressure chromatography system. Peptides were loaded at 4 µg, based on average bicinchoninic acid assay (BCA) results, and separated on a gradient of 7-10% B organic phase for 2 min, 10-32% B for 53 min, 32-70% for 1 min, 70-70% for 2.5 min, and then decreased from 70% B to 1% B www.nature.com/scientificreports/ over 1 min, on a C18 column (15 cm length, 300 µm ID, 100 Å pore size, Phenomenex) over the course of 60 min with a constant flow rate of 9.5 µL/min. Mass range of 400-1000 and automated gain control (AGC) target value for fragment spectra of 300% was used. Peptide ions were fragmented at a normalized collision energy of 30%. Fragmented ions were detected across 50 non-overlapping data independent acquisition precursor windows of size 12 Da. MS2 Resolution was set to 15,000 with an ion transmission time of 22 ms. All data is acquired in profile mode using positive polarity. Protein estimates were normalized across batches to mitigate systematic error.
Statistical analysis. Baseline demographics were described using means, standard deviations, and proportions. For proteins with quantifiable levels in over 50% of the samples, samples with values below the lower level of detection were imputed as one less than the lowest detected value. The protein values were log 10 transformed for normalization, which was confirmed visually with histograms. When protein levels were detected in less than 50% of the samples, the values were transformed to a dichotomous variable to indicate detected versus not detected.
The association between continuously modelled, log-transformed biomarkers and progression of SSc-ILD was assessed through logistic regression. For proteins that were dichotomized, chi-squared testing was used. The false discovery rate was calculated using the Benjamini Hochberg method.
Principal component analysis was performed to reduce the dimensionality of the protein biomarker set. The relationship between the top 10 principal components and progression was evaluated using logistic regression. Additionally, unsupervised hierarchical clustering by Ward linkage with heatmapping was done to define unique groups.
Least absolute shrinkage and selection operator (LASSO) logistic regression test was performed to construct a protein model for prediction of progressive SSc-ILD. The area under the curve (AUC) for this model was computed to summarize the predictive utility for this combination of biomarkers. A permutation test framework was used to estimate an overall empiric p value for the AUC 26 .
Proteins associated with progression of disease at a p value of < 0.1 were queried in the QIAGEN Ingenuity Pathway Analysis (IPA) software to identify interaction networks, signaling, and metabolic pathways. The fold change of each protein using the progressive group as the denominator was calculated. The upstream regulator with a predictive activation state or canonical pathways detected with an absolute Z score ≥ 2 were reported.
Additional analyses included logistic regression models to determine the association between proteins and progression of FVC. In addition to unadjusted logistic regression, multivariable regression models adjusting for clinical variables of race, smoking status, systemic sclerosis subtype, presence of pulmonary hypertension, anti-Scl-70 and ACA antibody status, pre-progression FVC (L), and pre-progression DLCO adjusted for hemoglobin (cc/sec/mmHg) were also performed. StataSE 17 software was used for statistical analysis.
Informed consent was obtained from all subjects and all methods and samples were collected and stored per the Institution Review Board approval at the University of California San Francisco. All methods were carried out in accordance with relevant guidelines and regulations.

Ethical approval. This study was approved by the Internal Review Board of the University of California San
Francisco.

Results
The cohort consisted of 72 patients, 32 of whom had progressive SSc-ILD (cases) and 40 who had stable disease (controls). Baseline characteristics are described in Table 1. In both groups, most patients were women (68.75% and 65%, respectively) and of white race (62.5%). The mean age was 51.6 years and mean BMI was 24 kg/m 2 . The proportion of patients with anti-Scl70 antibody, ACA antibody positivity, and immunosuppression use was similar between groups. There was a higher percentage of never-smokers in the progressive group. Preprogression absolute and percent-predicted values of FVC and DLCO were lower in the progressive SSc-ILD group. The median percent change in FVC over one year was -18% in the progressive group compared to 4% in the control group. In a multivariable logistic regression model using the clinical variables of race, smoking status, SSc subtype, presence of pulmonary hypertension, anti-Scl-70, and ACA antibody status, pre-progression FVC and pre-progression DLCO were not independently associated with progression of disease.
A total of 794 proteins were detected. Of these, 285 proteins had quantified values for all patients. There were 367 proteins for which quantified values were detected for greater than 50% of subjects and for which values below the level of detection were imputed. There were 142 proteins for which greater than 50% of patients' values were undetected and were thus treated as dichotomous values (detected/undetected).
On unadjusted logistic regression, 29 proteins exhibited an association with ILD progression at greater than 0.05 significance level (Table 2). After adjusting for multiple testing, these associations did not remain statistically significant. Additionally, there were no significant associations between pre-progression FVC and protein levels after adjusting for multiple testing. When adjusted for clinical variables, there were no significant associations between protein levels and progression of disease or pre-progression FVC.
Principal component analysis revealed 72 components, and the ten components with the highest Eigenvalues represented explained 41% of the variability of the sample. There was no significant association between each of these ten components and progression of disease (Table 3). A scatterplot of the two principal components with the highest Eigenvalues shows no differentiation between groups (Fig. 1). Unsupervised clustering analysis revealed no significant heterogeneity between the subjects (Fig. 2).
A LASSO model identified the DBNL (Drebin like protein) and RIN3 (RIN3 protein, found in many immune cells) proteins, as well as an uncharacterized protein as potential candidates for predictors of progression (  27,28 . Additionally, DBNL, a cytoskeletal protein is involved in T cell signaling and activation 29 . The area under the curve for this model was 0.839 (predicted AUC 0.937), and the empiric p-value for the model was 0.3. All proteins with a p value of < 0.1 were queried in the QIAGEN Ingenuity Pathway Analysis. There were five upstream regulators that targeted molecules in the dataset with an absolute Z score ≥ 2. Of these, three predicted an activated state in the progressive group: rosiglitazone (Z-score 2.173), PKD1 (Z-score 2) and phytohemagglutinin (Z-score 2). Two regulators predicted an inhibited state in the progressive group: EDN1 (Z-score − 2.183) and beta-estradiol (Z score − 2.309). The canonical pathway of liver X receptor/retinoid receptor (LXR/RXR) activation, associated with inflammation and lipid metabolism, had a higher signal in the progression group (Z-score 2.121).

Discussion
In this study, we identified 29 proteins that were associated with progressive SSc-ILD. While these associations did not remain statistically significant after accounting for multiple testing, these the proteins were found to be part of pathways relevant to autoimmunity and fibrogenesis. We found that the LXR/RXR pathway, which is associated with inhibition of the expression of interleukin-6, cyclooxygenase-2, and nitric oxide synthase 30 , was highly expressed in the progressive group. It suggests that the disease signature is constant between stable and progressors but only that LXR/RXR is activated involved in NO and macrophage signaling in the latter group 31 . Rosiglitazone, a PPAR-γ agonist, has been studied as an attenuator in scleroderma lung fibroblasts [32][33][34] . The rosiglitazone pathway was more significantly activated in the progressor group. Additionally, the beta-estradiol pathway was more inhibited in the progressor group, which was not explained by sex differences between the two groups. Given prior data showing that men with SSc-ILD have increased ILD severity and higher mortality compared to women 35,36 , this could signal a role of sex hormones in the pathophysiology of fibrotic lung disease.
Currently, there is an unmet need to identify patients at risk for progression of SSc-ILD to optimize timing of treatment initiation. To date, there are no such indicators, and progression is only realized after the fact with decrements in FVC or radiologic findings. Given the ease of collection, blood biomarkers are desirable relative to other sites (e.g. bronchoalveolar lavage fluid or biopsied lung tissue). Prior biomarker studies have focused on establishing the risk for developing ILD but not predicting ILD progression in established disease 37,38  www.nature.com/scientificreports/ study design, in which we sought to identify a biomarker predicts progression of disease and aids in the crucial clinical decision of when to start therapy for SSc-ILD, is a strength of our study. We utilized blood collected around the time of FVC decline to maximize the likelihood that protein alterations coincided with progression of lung disease rather than abnormalities in other organs affected by systemic sclerosis. Furthermore, our SSc-ILD cohort is well-phenotyped and had longitudinal data to determine disease trajectory. As such, we were able to compare two distinct, clinically relevant trajectories of SSc-ILD. Sample size was the major limitation of our study, as we had a small number of subjects relative to the large number of proteins queried. Thus, we may have lacked the power to detect statistically significant differences. Similarly, imputation of data for undetected proteins to one less than minimum value may have narrowed the differences in protein levels between the two groups, which would favor the null hypothesis. Additionally, a high  www.nature.com/scientificreports/ percentage of patients in both groups were receiving immunosuppressive agents, which could have altered the expression of inflammatory and immunologic proteins. Consequently, the results may not be representative of treatment naïve patients who may have higher differences in these protein levels or differential protein expression altogether from already-treated patients.
There are several general challenges to the discovery of biomarkers in this setting. Unlike other interstitial lung diseases such as idiopathic pulmonary fibrosis or chronic hypersensitivity pneumonitis, SSc involves multiple organ systems, and given the complexity of the disease, concurrent organ involvement may create greater heterogeneity in circulating molecular markers that makes identifying a protein uniquely associated with ILD more challenging. While an unbiased approach can identify novel proteins of interest, the multiple testing that results necessitates a large sample size (or dramatic differences in protein expression between groups) to reach statistical significance. Given the relative rarity of SSc-ILD, sample size is often limited.
In summary, we sought to identify biomarkers predictive of progressive SSc-ILD. We identified 29 proteins involved in the pathways of the inflammatory and fibrotic cascade that could be associated with disease progression, however, after adjusting for multiple testing these associations did not remain significant. Given that some of these proteins have been found to play a role in inflammatory and fibrotic pathways, future directions may www.nature.com/scientificreports/ include a targeted evaluation of these 29 proteins as predictors in another SSc-ILD cohort, as well as application of this study design to a treatment naïve population.

Data availability
All data generated or analyzed during this study are included in this published article (and its Supplementary Information files).